import os
os.listdir('/home/martin/example_files')
Today's topic is a very common task in scientific programming that can be made much more robust with a bit of effort. As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.
If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list and I’ll send a single email at the end with links to them all.
Imagine we have a folder with a collection of files that we need to process in some way. Listing the contents is straightforward once we figure out the path. To keep the path relatively short, let’s say the folder is in my home directory:
['README.md', '.ipynb_checkpoints', 'banana.txt', 'apple.txt']
There are two text files that we want to process, and two files that we want to skip. If you’re used to using Windows machines then the path might look weird to you, but don’t worry about it for now - we will come back to that later.
The first thing that most beginners try is this:
processing README.md
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[7], line 3
      1 for filename in os.listdir('/home/martin/example_files'):
      2     print('processing ' + filename)
----> 3     file = open(filename)

File ~/miniforge3/envs/ml/lib/python3.13/site-packages/IPython/core/interactiveshell.py:343, in _modified_open(file, *args, **kwargs)
    336 if file in {0, 1, 2}:
    337     raise ValueError(
    338         f"IPython won't let you open fd={file} by default "
    339         "as it is likely to crash IPython. If you know what you are doing, "
    340         "you can use builtins' open."
    341     )
--> 343 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'README.md'
but we immediately run into an error. When we pass a filename to open, Python looks in the current working directory, which is not the location of our files.
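The failure is easy to reproduce without touching a real home directory. Here is a minimal sketch, assuming a throwaway folder created with tempfile stands in for /home/martin/example_files; the helper name try_open_bare_names is made up for illustration and is not from the original notebook.

```python
import os
import tempfile


def try_open_bare_names(folder):
    """Try to open each name returned by os.listdir(folder) as-is.

    Returns the names that could not be opened. This assumes folder is
    not the current working directory.
    """
    failed = []
    for filename in os.listdir(folder):
        try:
            # A bare name is resolved relative to os.getcwd(), not folder.
            with open(filename):
                pass
        except OSError:
            failed.append(filename)
    return failed


# Hypothetical stand-in for /home/martin/example_files.
with tempfile.TemporaryDirectory() as folder:
    for name in ('apple.txt', 'banana.txt'):
        with open(os.path.join(folder, name), 'w') as f:
            f.write('example\n')
    failed_names = sorted(try_open_bare_names(folder))
    print(failed_names)
```

Every name fails to open here, because os.listdir returns names only, with no folder attached.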
At this point we realise that we have to construct the path to the file. We might try concatenation:
processing README.md
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[9], line 3
      1 for filename in os.listdir('/home/martin/example_files'):
      2     print('processing ' + filename)
----> 3     file = open('/home/martin/example_files' + filename)

File ~/miniforge3/envs/ml/lib/python3.13/site-packages/IPython/core/interactiveshell.py:343, in _modified_open(file, *args, **kwargs)
    336 if file in {0, 1, 2}:
    337     raise ValueError(
    338         f"IPython won't let you open fd={file} by default "
    339         "as it is likely to crash IPython. If you know what you are doing, "
    340         "you can use builtins' open."
    341     )
--> 343 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: '/home/martin/example_filesREADME.md'
This also causes an error, but if we look closely we will realise that we have forgotten the folder separator, which on my Linux system is /. So let’s add it to the end of the path:
processing README.md
processing .ipynb_checkpoints
---------------------------------------------------------------------------
IsADirectoryError                         Traceback (most recent call last)
Cell In[10], line 3
      1 for filename in os.listdir('/home/martin/example_files/'):
      2     print('processing ' + filename)
----> 3     file = open('/home/martin/example_files/' + filename)

File ~/miniforge3/envs/ml/lib/python3.13/site-packages/IPython/core/interactiveshell.py:343, in _modified_open(file, *args, **kwargs)
    336 if file in {0, 1, 2}:
    337     raise ValueError(
    338         f"IPython won't let you open fd={file} by default "
    339         "as it is likely to crash IPython. If you know what you are doing, "
    340         "you can use builtins' open."
    341     )
--> 343 return io_open(file, *args, **kwargs)

IsADirectoryError: [Errno 21] Is a directory: '/home/martin/example_files/.ipynb_checkpoints'
The next error comes when we realise that we actually have another, hidden folder inside the folder that we want to process. In this case it’s a folder generated by Jupyter notebook itself. So we need to add some logic to skip it:
processing README.md
processing banana.txt
processing apple.txt
Now we nearly have some working code; we just need to add the logic to also skip the .md file:
processing banana.txt
processing apple.txt
Finally we have the logic that we want. In this case we have only two files, but real datasets will often have hundreds.
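The filtering condition can be tried on the directory listing from earlier without opening anything. This is a small sketch that wraps the condition in a helper; the name should_process is made up for illustration, and the loop around it is assumed to be the same as in the earlier attempts.

```python
def should_process(filename):
    """Skip hidden entries (leading dot) and Markdown files."""
    return not (filename.startswith('.') or filename.endswith('.md'))


# The listing shown earlier for the example folder.
names = ['README.md', '.ipynb_checkpoints', 'banana.txt', 'apple.txt']
to_process = [name for name in names if should_process(name)]
print(to_process)  # ['banana.txt', 'apple.txt']
```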
There are several problems with this code. Firstly, it’s a dangerous pattern to have the path hard-coded in two places; it makes it very easy to accidentally change one path but forget to change the other:
processing strawberry.txt
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[15], line 4
      2     if not (filename.startswith('.') or filename.endswith('.md')):
      3         print('processing ' + filename)
----> 4         file = open('/home/martin/example_files/' + filename)

File ~/miniforge3/envs/ml/lib/python3.13/site-packages/IPython/core/interactiveshell.py:343, in _modified_open(file, *args, **kwargs)
    336 if file in {0, 1, 2}:
    337     raise ValueError(
    338         f"IPython won't let you open fd={file} by default "
    339         "as it is likely to crash IPython. If you know what you are doing, "
    340         "you can use builtins' open."
    341     )
--> 343 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: '/home/martin/example_files/strawberry.txt'
which leads to bugs that can be hard to track down. We might fix this by making the path a variable:
processing banana.txt
processing apple.txt
which will also make it a bit easier to change. But we can’t guarantee that the eventual user of this code will include the trailing folder separator in the folder path. If they don’t:
processing banana.txt
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[17], line 6
      4     if not (filename.startswith('.') or filename.endswith('.md')):
      5         print('processing ' + filename)
----> 6         file = open(folder_path + filename)

File ~/miniforge3/envs/ml/lib/python3.13/site-packages/IPython/core/interactiveshell.py:343, in _modified_open(file, *args, **kwargs)
    336 if file in {0, 1, 2}:
    337     raise ValueError(
    338         f"IPython won't let you open fd={file} by default "
    339         "as it is likely to crash IPython. If you know what you are doing, "
    340         "you can use builtins' open."
    341     )
--> 343 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: '/home/martin/example_filesbanana.txt'
Then the listdir call still works, but the open call breaks, because the folder name and the filename get glued together with no separator between them. So we might have to add it explicitly:
processing banana.txt
processing apple.txt
This makes the path construction more complicated and error-prone.
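To see why this gets fiddly, here is one way the explicit separator might be handled so that both spellings of the folder path work. It is only a sketch: the helper name build_path is hypothetical, and it hard-codes / as the separator, which is an assumption that holds on Linux and Mac.

```python
def build_path(folder_path, filename):
    """Join folder and filename with exactly one '/' between them."""
    # Strip any trailing separator first so we never end up with '//'.
    return folder_path.rstrip('/') + '/' + filename


with_slash = build_path('/home/martin/example_files/', 'banana.txt')
without_slash = build_path('/home/martin/example_files', 'banana.txt')
print(with_slash)
print(without_slash)
```

Both calls give the same path, but we now have string-handling logic to maintain just to open a file.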
Another problem with the code is the complexity of the filename filtering. As a general rule, rather than try to list all of the possible patterns to skip, it’s more robust to do a positive selection:
processing banana.txt
processing apple.txt
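The difference between the two approaches shows up as soon as an unexpected file lands in the folder. In this sketch, results.csv is a hypothetical later addition that is not in the original example, and both helper names are made up for illustration.

```python
# The original listing plus one hypothetical newcomer, results.csv.
names = ['README.md', '.ipynb_checkpoints', 'banana.txt', 'apple.txt',
         'results.csv']


def passes_skip_list(filename):
    """Negative selection: reject the patterns we thought of in advance."""
    return not (filename.startswith('.') or filename.endswith('.md'))


def is_text_file(filename):
    """Positive selection: accept only what we actually want."""
    return filename.endswith('.txt')


kept_by_skip_list = [n for n in names if passes_skip_list(n)]
kept_by_selection = [n for n in names if is_text_file(n)]
print(kept_by_skip_list)  # ['banana.txt', 'apple.txt', 'results.csv']
print(kept_by_selection)  # ['banana.txt', 'apple.txt']
```

The skip list lets the new file through, while the positive selection keeps ignoring it without any changes.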
However, an even better solution is to use glob from the standard library. The glob function, which lives in the module of the same name, takes care of listing the files in our target folder, filtering to give just the ones that we want, and constructing complete paths all in one go:
['/home/martin/example_files/banana.txt',
'/home/martin/example_files/apple.txt']
giving us a list of complete paths that we can plug straight into open:
processing /home/martin/example_files/banana.txt
processing /home/martin/example_files/apple.txt
As a nice side effect, we get much more useful debugging output that includes the exact path to the files that we will be processing. If you have used the Linux/Mac command line at all then you probably already know the special syntax that glob uses for specifying folder and file names, but if not then it’s easy to learn.
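For anyone who hasn't met the pattern syntax before, here is a minimal sketch of the two most common wildcards. It assumes a temporary folder standing in for /home/martin/example_files, and it sorts the results because glob makes no promise about ordering.

```python
import glob
import os
import tempfile

# Hypothetical stand-in for /home/martin/example_files.
with tempfile.TemporaryDirectory() as folder:
    for name in ('README.md', 'banana.txt', 'apple.txt'):
        with open(os.path.join(folder, name), 'w') as f:
            f.write('example\n')
    os.mkdir(os.path.join(folder, '.ipynb_checkpoints'))

    # glob.escape protects any special characters in the folder name itself.
    base = glob.escape(folder)

    # '*' matches any run of characters within a single name.
    txt_paths = sorted(glob.glob(os.path.join(base, '*.txt')))
    for path in txt_paths:
        print('processing ' + path)
        with open(path) as f:
            f.read()

    # '?' matches exactly one character, so this wants five-letter names.
    five_char_paths = sorted(glob.glob(os.path.join(base, '?????.txt')))

    txt_names = [os.path.basename(p) for p in txt_paths]
    five_char_names = [os.path.basename(p) for p in five_char_paths]
```

Neither README.md nor the hidden folder needs any special handling, since neither matches the pattern.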
Once we have this pattern set up it’s easy to use it to do more complicated things, like list files in multiple folders:
['/home/martin/example_files/banana.txt',
'/home/martin/example_files/apple.txt',
'/home/martin/example_files2/strawberry.txt']
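A listing like this can come from putting a wildcard in the folder part of the pattern as well. The following sketch assumes the pattern was something like example_files*/*.txt, which is a guess rather than the original cell, and it recreates the two folders inside a temporary directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as parent:
    # Hypothetical layout mirroring the two folders in the output above.
    layout = {
        'example_files': ['banana.txt', 'apple.txt', 'README.md'],
        'example_files2': ['strawberry.txt'],
    }
    for folder, names in layout.items():
        os.mkdir(os.path.join(parent, folder))
        for name in names:
            with open(os.path.join(parent, folder, name), 'w') as f:
                f.write('example\n')

    # One wildcard for the folder name, one for the file name.
    pattern = os.path.join(glob.escape(parent), 'example_files*', '*.txt')
    found = sorted(os.path.relpath(p, parent) for p in glob.glob(pattern))
    print(found)
```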
Bonus: the pathlib module from the standard library also contains many useful functions for path construction and manipulation.
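As a taste of what that looks like, here is a short sketch of the same listing done with pathlib, again using a temporary folder as a stand-in for the real one. Path objects can be joined with the / operator, which takes care of the separator for us.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ('README.md', 'banana.txt', 'apple.txt'):
        # The / operator builds a path with the right separator.
        (folder / name).write_text('example\n')

    # Path.glob takes a pattern relative to the folder and yields Path objects.
    txt_files = sorted(folder.glob('*.txt'))
    stems = [p.stem for p in txt_files]
    contents = [p.read_text() for p in txt_files]
    print(stems)  # ['apple', 'banana']
```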